Anomaly Detecting Within Dynamic Chinese Chat Text

نویسندگان

  • Yunqing Xia
  • Kam-Fai Wong
چکیده

The problem in processing Chinese chat text originates from the anomalous characteristics and dynamic nature of such a text genre. That is, it uses ill-edited terms and anomalous writing styles in chat text, and the anomaly is created and discarded very quickly. To handle this problem, one solution is to re-train the recognizer periodically. This costs a lot of manpower in producing the timely chat text corpus. The new approaches are proposed in this paper to detect the anomaly within dynamic Chinese chat text by incorporating standard Chinese corpora and chat corpus. We first model standard language text using standard Chinese corpora and apply these models to detect anomalous chat text. To improve detection quality, we construct anomalous chat language model using one static chat text corpus and incorporate this model into the standard language models. Our approaches calculate confidence and entropy for the input text and apply threshold values to help make the decisions. The experiments prove that performance equivalent to the best ones produced by the approaches in existence can be achieved stably with our approaches.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A Phonetic-Based Approach to Chinese Chat Text Normalization

Chatting is a popular communication media on the Internet via ICQ, chat rooms, etc. Chat language is different from natural language due to its anomalous and dynamic natures, which renders conventional NLP tools inapplicable. The dynamic problem is enormously troublesome because it makes static chat language corpus outdated quickly in representing contemporary chat language. To address the dyna...

متن کامل

Introspective Study of Emotion Icon in Public Chat as a Gesture of Texting

An emotion icon, better known as emoticon is a metacommunicative pictorial representation of a facial expression that, in the absence of body language and prosody, serves to draw a receiver's attention to the tenor or temper of a sender's nominal verbal communication, changing and improving its interpretation. The present study investigates the use of these nonverbal cues in whatsapp public cha...

متن کامل

Constructing A Chinese Chat Language Corpus with A Two-Stage Incremental Annotation Approach

Chat language refers to the special human language widely used in the community of digital network chat. As chat language holds anomalous characteristics in forming words, phrases, and non-alphabetical characters, conventional natural language processing tools are ineffective to handle chat language text. Previous research shows that knowledge based methods perform less effectively in processin...

متن کامل

Conversation Extraction in Dynamic Text Message Stream

Text message stream which is produced by Instant Messager and Internet Relay Chat poses interesting and challenging problems for information technologies. It is beneficial to extract the conversations in this kind of chatting message stream for information management and knowledge finding. However, the data in text message stream are usually very short and incomplete, and it requires efficiency...

متن کامل

Chinese Novelty Mining

Automated mining of novel documents or sentences from chronologically ordered documents or sentences is an open challenge in text mining. In this paper, we describe the preprocessing techniques for detecting novel Chinese text and discuss the influence of different Part of Speech (POS) filtering rules on the detection performance. Experimental results on APWSJ and TREC 2004 Novelty Track data s...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2006